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We present an implicit solvent model for ab initio electronic structure calculations 
which is fully self-consistent and is based on direct solution of the nonhomoge- 
neous Poisson equation. The solute cavity is naturally defined in terms of an iso- 
surface of the electronic density according to the formula of Fattebert and Gygi 
(J. Comp. Chem. 23, 6 (2002)). While this model depends on only two parame- 
ters, we demonstrate that by using appropriate boundary conditions and dispersion- 
repulsion contributions, solvation energies obtained for an extensive test set includ- 
ing neutral and charged molecules show dramatic improvement compared to existing 
models. Our approach is implemented in, but not restricted to, a linear-scaling den- 
sity functional theory (DFT) framework, opening the path for self-consistent implicit 
solvent DFT calculations on systems of unprecedented size, which we demonstrate 
with calculations on a 2615-atom protein-ligand complex. 
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The role of solvent is critical to a multitude of chemical, biological and physical processes. 
The accurate simulation of such processes, therefore, requires careful treatment of solvation 
effects. However, explicit inclusion of the solvent with full atomic detail is very costly due 
to the significant increase in the number of simulated atoms and the need for extensive 
averaging over the solvent degrees of freedon>i. Moreover, such explicit treatment may 
also be unnecessary, as it is often the long-range electrostatic effect of the solvent that is 
most significant, with only a small proportion of solvent molecules involved chemically. The 
implicit solvent approach addresses these issues by retaining only the atomic details of the 
solute, placed in a suitably defined cavity, and by representing the solvent environment by 
an unstructured dielectric continuum outside this cavity. The free energy of solvation is 
typically decomposed into two contributions - the electrostatic energy of interaction of the 
solvent with the polarized dielectric, and a nonpolar term accounting for the work required 
to create a cavity in the solvent (cavitation energy), and, in more complex models, for 
dispersion-repulsion interactions between the solute and solvent. 

A multitude of implicit solvent models of differing sophistication have been proposed to 
date^ for use in ab initio calculations. Many of these models are derived from the self- 
consistent reaction field (SCRF) formalism, where the effect of the electric field due to 
the dielectric (polarized by the solute) is included in the Hamiltonian in a self-consistent 
fashion. Two widely used classes of SCRF-type models are the polarizable continuum model 
(PCM) of Tomasi et al^ and the conductor-like screening model (COSMO) of Klamt and 
Schuurmann^. 

The shape of the cavity containing the solute varies between models - early models 
used spherical or elliptical cavities; in more recent models the cavity is usually constructed 
from overlapping atomic spheres of varying radii, which necessitates using a number of 
parameters. In contrast, the recent model proposed by Fattebert and Gygi^i^, and later 
developed by Scherlis et ali^ (henceforth called the FGS model), utilizes a dielectric cavity 
constructed directly from the electronic density of the solute, which greatly reduces the 
number of parameters involved. 

This "minimal-parameter" nature of the FGS model makes it attractive for ah initio 
calculations. However, this approach also has several shortcomings, which we address in this 
Letter. First, the original model did not include dispersion-repulsion effects, which, as the 
authors themselves note, is likely to impact its accuracy for larger neutral molecules. Second, 
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the model necessitates the use of an a posteriori correction to the energy in vacuum, obtained 
in periodic boundary conditions, to approximate open boundary conditions, whereas in the 
solvent the electrostatic energy is obtained subject to zero boundary conditions. Third, a 
severe numerical instability prevents this approach from being practical for large molecules. 

This Letter describes how we have addressed these limitations, by including dispersion- 
repulsion interactions, employing open (Coulombic) boundary conditions, and identifying 
and circumventing the root cause of the abovementioned numerical instability. We then 
validate and evaluate the performance of the model on two sets of several tens of small 
molecules. Finally, by performing a calculation on a 2615-atom protein- ligand system, we 
demonstrate how the implemented model can be used to perform large-scale ab initio cal- 
culations in solution. 

In contrast to other SCRF models where the solute cavity has a discontinuous boundary, 
the FGS model defines a smooth transition of the relative permittivity according to: 

(P (r) /poY 




e{r) = i+ 1 1 + . . : : , i , (i) 



where p (r) is the electronic density of the solute, £00 is the bulk permittivity, the parameter 
P controls the smoothness of the transition of e {r) from 1 to Eoo, and po is the density 
value for which the permittivity drops to eoo/2. The cavitation contribution to the free 
energy is assumed to be proportional to the surface area, S, of the cavity (calculated at 
p = Po), that is AGcav = iS{pq), where 7 is the solvent surface tension. Values for /3 
and Po are found by a least-squares fit to the hydration energies of ammonia, nitrate and 
methylammonium (representative of neutral, anionic and cationic molecules, respectively)^. 
The total potential of the solute in the presence of the dielectric, (r) is obtained by solving 
the nonhomogeneous Poisson equation 

V ■ {e [p] V0) = -47rptot (2) 

directly in real space subject to zero Dirichlet boundary conditions. The total charge density 
Ptot (r) is a sum of the electronic density p (r) and a Gaussian-smeared density of the cores, 
as proposed in ref. [t. 

As outlined in ref. Q, the fact that the dielectric cavity responds self-consistently to 
changes in the electronic density means that the functional derivative of the electrostatic 



energy, E^s, is no longer equal to the potential that is the solution of eq. ([2]), but rather: 

(3) 
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FIG. 1. Calculated free energies of solvation plotted against corresponding experimental values - 
a comparison between this work and other models. The solid diagonal line represents perfect agree- 
ment with experiment, the dashed black lines denote the estimated uncertainty of the experimental 
values. 



The original FGS model does not set out to address dispersion-repulsion effects. This 
makes the results obtained for larger molecules dubious, especially for those that are neu- 
tral, as then the electrostatic contribution to solvation would be dwarfed by the nonpolar 
terms. As the authors duly note, this deficiency already becomes evident for the case of 
benzene where this model predicts a AG of 7.9 kcal/mol^ whereas the experimental value is 
-0.87 kcal/mol^. To appreciate the magnitude of the problem we refer to fig. [T]and the top 
of table m where results obtained with the FGS model are shown for a representative selec- 
tion of 20 neutral, 20 cationic and 20 anionic molecules chosen from ref. |8|. The geometries 
of the molecules were not re-optimized in solvent, instead geometries optimized in the gas 



phase readily available from ref. |8| were used. The FGS model underestimates the solvation 
effect for anions and overestimates it for cations. The predictions for neutral species are in 
moderate agreement with experiment. By examining the coefficient of correlation between 
the calculated and experimental values, we demonstrate that the obtained values do not 
correlate well with experiment (with the notable exception of cations), which makes the 
calculation of relative free energies of solvation, AAG, unreliable. 

The second shortcoming of this approach is related to the boundary conditions used for 
the solution of eq. ([2]). References^i^ are concerned only with calculations in solution, where 
zero boundary conditions are used. Owing to the dielectric screening, this is a reasonable 
approximation, as long as the relative permittivity of the solvent is large. Ref.- uses the same 
approach in solution, whereas for the reference vacuum calculation (needed to obtain free 
energies of solvation), where the Poisson equation becomes homogeneous, standard periodic 
plane-wave DFT calculations are performed. In vacuo energies thus obtained are subse- 
quently corrected with the Makov-Payne formula^ to mimic the effect of open boundary 
conditions. This too is an approximation, since the correction cannot fully capture polar- 
ization effects^. Furthermore, only the energy is corrected, while the shape of the electronic 
density, and, in turn, the cavity generated in solution corresponds to periodic boundary con- 
ditions. As we demonstrate later, this subtly affects the free energies of solvation obtained 
for charged species, leading to a degree of cancellation of errors. 

Further, we point out the root cause of the numerical instability inherent in the FGS 
model. The second term in the RHS of eq. ([3]) is extremely difficult to evaluate accurately, 
because |^ is very close to zero everywhere, except on the boundary of the cavity, where, 
in turn, (V</'(r))^ is almost zero and thus difficult to distinguish from numerical noise. 
Because of this, the energy gradient calculated from eq. ([3]) is not numerically accurate and 
the method is found to converge only when high-order finite-differences and extremely fine 
grids (with a spacing of 0.15 or finer) are used, as only then the gradient of the potential 
can be evaluated to sufficient accuracy. The memory requirements necessitated by such fine 
grids quickly make the technique impractical for larger molecules. 

By addressing each of these limitations, we obtain a highly accurate and usable approach 
which retains the conceptual elegance of the FGS model. 

We solve eq. ([2]) by means of a second-order multigriciii>i^ approach, which is subsequently 
defect-corrected^ in an iterative fashion using 10-th order finite-difference stencils for the 
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TABLE I. Error (root mean square (rms) and maximum, in kcal/mol), with respect to experimenljSj 
in the calculated free energies of solvation, and the corresponding coefficient of correlation, r, 
between the calculated and experimental values, for the 20 neutral, 20 cationic and 20 anionic 
species studied. 



Approach 


xc 

functional 


neutral species 
rms err. max err. r 


cations 
rms err. max err. r 


anions 
rms err. max err. r 


FGS 


PBE 


5.0 


8.8 0.87 


9.7 


19.0 0.95 


19.5 


32.4 0.55 


this work^ 


PBE 


1.6 


2.8 0.93 


4.4 


10.2 0.95 


18.1 


29.5 0.53 


this work^ PBE 


5.0 


8.9 0.87 


10.4 


19.0 0.95 


21.2 


35.1 0.54 


this work_^ 


PBE 


1.8 


3.1 0.93 


3.9 


8.3 0.94 


18.1 


29.4 0.54 


PCM 


PBE 


4.9 


12.7 0.75 


10.5 


21.7 0.83 


17.8 


29.5 0.36 


PCM 


B3LYP 


4.7 


12.0 0.78 


10.4 


21.8 0.83 


17.0 


28.4 0.41 


PCM 


M05-2X 


4.4 


11.1 0.79 


10.2 


21.7 0.81 


15.7 


26.8 0.46 


SMD 


M05-2X 


0.9 


2.9 0.97 


4.6 


12.2 0.95 


8.5 


16.6 0.86 


AMBEfti^ 


(classical) 


3.3 


7.84 0.64 


6.9 


10.8 0.96 


12.8 


47.8 0.32 



With the cavity responding self-consistently to changes in density. 
^ With the cavity responding self-consistently to changes in density, and without dispersion-repulsion. 
With the cavity fixed. 



first and second derivatives. We find that with a grid spacing of 0.125 Oq as few as 3- 
4 defect-correction iterations are sufficient to reduce the algebraic error in the obtained 
potential by four orders of magnitude with respect to the initial, uncorrected solution. The 
corresponding reduction in the magnitude of the residual is two orders of magnitude, due 
to the approximate nature of the calculated boundary conditions. With a grid spacing of 
0.25 ten iterations, on average, were necessary. 

We have recast the solvation problem into open boundary conditions by computing the 
core-core and the local pseudopotential terms in real space and by using open (Coulombic) 
boundary conditions when solving eq. ([2]) - that is, we set up Dirichlet boundary conditions 
by evaluating the Coulombic potential due to the charge distribution ptot- Since with a 
spatially localized density the calculation of the Coulombic integral for all the points on 
the boundary scales as O {L'^N^.t) with the box length L and number of atoms N^^t, charge 
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coarse-graining and interpolation were used to reduce the prefactor in this calculation by 
about three orders of magnitude. In so doing, we obtain in vacuo energies and densities 
that need not be corrected. In the solvated case, where the nonhomogeneity in e prevents 
such an approach, we calculate the boundary conditions by approximating the dielectric as 
homogeneous, with the permittivity of the bulk solvent. Fig. [21 panels a) and b) and tabled 
rows 1 and 3 illustrate that for the simulation cell we used (a cube with an edge length of 
L = 40.5 ao) this alone offers no improvement in accuracy. A difference in the free energy of 
solvation of 0.1 kcal/mol, 0.9 kcal/mol and -1.8 kcal/mol is observed on average for neutral, 
cationic and anionic species, respectively, compared to the predictions of the original FGS 
model (refer to fig. [3] for details). For charged species the application of consistent open 
boundary conditions leads to a slight increase in the error. 




1.0 j} 2.0 1.0 p 2.0 1.0 p 2.0 



FIG. 2. Isolines of zero error with respect to experiment for the possible parametrizations of a) 
the FGS model, b) the FGS model with the boundary conditions we propose and c) the model 
proposed in this work, for three representative molecules. The dashed lines indicate an error of 
±1 kcal/mol. The black points indicate the final parametrization of the respective models. 

A parameter sweep for the three molecules used to parametrize the FGS model demon- 
strates (cf. fig. [2]) that there exists no parametrization that would result in even moderate 
agreement with experiment for the three species simultaneously. The model would consis- 
tently either underestimate free energies of solvation for anions or overestimate them for 
cationic species. 

We attribute this failure to a combination of factors - the poor performance of the Perdew- 
Burke-Ernzerhof (PBE) exchange-correlation (XC) functional; the fact that any isodensity 
formulation will use larger cavities for anions than for corresponding cations, whereas the 
charge assymmetry in solvation effects is in fact opposite^; and, finally, to the lack of 
inclusion of dispersion-repulsion effects, which leads to an overestimation of the nonpolar 
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FIG. 3. Absolute error in the calculated free energies of solvation with respect to experimental 
values, plotted against the surface area of the molecule. The surface areas differ between models 
because the parametrization is different. The solid and dashed lines represent a linear fit. The 
horizontal dotted line corresponds to perfect agreement with experiment. 



component of solvation. The middle rows of table |T] show, on the example of PCM, how 
using a hybrid functional such as B3LYP or a hybrid meta-GGA functional such as M05-2X— 
addresses the first problem, by reducing the self-interaction error, which otherwise leads to 
excessive delocalization of the electrons, but does not address the other two problems. 

The increase in the magnitude of the difference between the calculated and experimentally 
obtained free energies of solvation with the size of the molecule, especially in the case of 
neutral molecules, demonstrated in fig. [3] indicates that the neglect of dispersion-repulsion 
effects is detrimental to the predictive quality of the FGS model. We propose including 
dispersion-repulsion effects in the free energy of solvation, AGdis.rep, using an approximate 
relation derived by Floris et alM. Since this relation is linear, it amounts to a simple rescaling 
of the surface tension of the solvent, including the approximate AGdis.rop in the cavitation 
term. From the slope of the linear relation plotted in fig. 1 of ref.— it follows that the surface 
tension should be rescaled by a factor of 0.281. Even this crude method for taking dispersion- 
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repulsion into account dramatically improves the accuracy of the model, as evidenced by 
figs. [T] and |5] and table [U from which it is apparent that the resulting approach is in much 
better agreement with experiment than both PCM and the force-field Poisson-Boltzmann 
(PB) approach of AMBER~, offering comparable quality to the much more complex SMI>i^ 
model. The improvement offered by the inclusion of dispersion-repulsion effects is evidenced 
by fig. [2] and can be quantified by comparing rows denoted with a and h in table [H The 
results corresponding to the row denoted with h were obtained by turning off the dispersion- 
repulsion contribution whilst using the parameters proposed in ref.-, denoted with a point 
in fig. [21 panels a) and b). 

The numerical instability caused by the second term in the RHS of eq. ([3]) can be cir- 
cumvented without loss of accuracy. We first note that this term disappears when, instead 
of responding to changes in the electronic density, the dielectric cavity is fixed. We propose 
constructing the cavity by the application of eq. ([1]) to the converged electronic density of 
the solute obtained in the vacuum calculation and keeping the cavity fixed throughout the 
calculation in solvent. We show (cf. tables [H [TTl) that the associated reduction in accuracy 
is insignificant, while both the wall time and the memory requirements of the computation 
are reduced by about an order of magnitude, as convergence is readily achieved with a more 
reasonable real-space grid spacing of 0.25 Qq- We should point out that a similar attempt to 
fix the cavity in the FGS model would probably lead to larger errors due to the fact that 
the fixed cavity would come from the periodic density of the vacuum calculation - as the 
Makov-Payne correctionr- only corrects the energy. We note that this simplified approach is 
still suitable for geometry optimization in solution, provided the additional contribution to 
the forces due to the cavity variation with atomic positions is included. Sa et alM also note 
the abovementioned instability and propose a somewhat different way of circumventing it. 

We further validate our model on 71 neutral molecules taken from the blind tests of 
refs. 



20 



and 



21 



for which the experimental energies of solvation are reported in ref.-. Again, 
the geometries were not re-optimized in solution, but rather the gas-phase geometries from 
ref.- were used. The results, shown in fig. H] and table [TTl again show that our approach 
is consistently more accurate than both PCM and the force-field PB approach of AMBER— 
and that our model offers a level of agreement with experiment which is comparable to the 
SMDi^ model, even when the cavity is fixed. 

Conventional ab initio calculations are typically limited to only a few hundred atoms at 



TABLE II. Error (root mean square (rms) and maximum, in kcal/mol), with respect to experiment, 
in the calculated free energies of solvation, and the corresponding coefficient of correlation, r, 
between calculated and experimental values for the 71 molecules studie d^^'^^ . 







rms 


max 


Approach 


XC functional 


error 


error r 


this work^ 


PBE 


3.8 


8.3 0.83 


this work^ 


FEE 


4.1 


9.1 0.83 


PCM 


PBE 


10.9 


23.3 0.53 


SMD 


M05-2X 


3.4 


14.5 0.87 


AMBER 


(classical) 


5.1 


19.9 0.77 



With cavity responding self-consistently to changes in density. 
^ With cavity fixed. 
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FIG. 4. Calculated free energies of solvation plotted against corresponding experimental values 
for the 71 neutral molecules studied22i2i - a comparison between models. The solid diagonal 
line represents perfect agreement with experiment, the dashed black lines denote the estimated 
uncertainty in the experimental valus. 



most. However, with recent advances in linear-scaling density functional theory (LS-DFT) 
approaches'^ a number of codes^i^i^^ have been developed which are capable of performing 
calculations with many thousands of atoms. The combination of LS-DFT with implicit 
solvent models would enable higly realistic simulations of important phenomena such as 
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TABLE III. Free energies of solvation (in kcal/mol) of L99A/M102Q T4 lysozyme (AGhost), 
its complex with catechol (AGcpix), catechol (AGng), desolvation energy of the ligand (AGd = 
AGcpix — AGhost — AGiig), binding energy in vacuo (AE'gas) and in solvent (AE'soi), as predicted 
by our model (with PBE and a fixed cavity) and AMBER. 



Approach 


AGcplx 


AGhost 


AGhg 


AGd 


A-Bgas 


AE'soi 


this work 


-2423.0 


-2421.3 


-7.5 


5.8 


-28.6 


-22.8 


AMBER 


-2428.3 


-2433.0 


-17.6 


22.4 


-27.7 


-5.3 


expt.— 






-9.3 









biomolecular processes or the chemical modification and self-assembly of nanostructures. 

As a demonstration of the potential applications of this approach in large-scale DFT 
calculations, we have implemented our solvent model in the LS-DFT code ONETEP^i and 
used it to calculate the free energy of solvation for a 2615-atom system composed of the 
catechol ligand bound to a L99A/M102Q mutant of the T4 lysozyme. The results of this 
calculation are shown in table IIIII and the overview of the system in question, along with an 
outline of the dielectric cavity (at p = po) is shown in fig. IS AMBER greatly overestimates 
the solvation energy of catechol, and consequently the binding energy in solvent differs 
significantly between the two models. The need for an ab initio model, where the density is 
polarized by the solvent is demonstrated by the different behavior of AGhost (as compared 
to AGcpix) between the proposed model and AMBER. 




FIG. 5. An overview of the lysozyme-catechol complex, with the dielectric cavity indicated in grey 
and catechol in green. 

In summary, we have outlined and validated an implicit solvent model for ab initio cal- 
culations, which, despite using only two parameters, offers a substantial improvement over 
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existing models, as measured by the agreement of absolute and relative free energies of solva- 
tion with experiment (compared to PCM and AMBER) or the number of parameters needed 
to achieve similar agreement (compared to SMD). We have shown how the implementa- 
tion of the proposed model in the LS-DFT code ONETEP paves the way for first-principles 
implicit-solvent calculations for molecules with thousands of atoms. 
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